1 - Seek correlation between monthly lagged price per mile and number of Traveling Passengers

Conclusion:

Price purpase on 4, 6, 15, 21 and 30 days prior de departure have the best correlation with number of passengers.

Next, checking those variables for wide enough spread to be significant and avoid overfitting.

2 - Box charts price per mile variation per departure lag

Conclusion:

Keep Day 4, 15, 21 and 30 but prefer Day 6 over Day 7 for its variablility

3 - Building number of passengers predictive model based on ppm for prior booking on Day 4, 5, 7, 21 and 30

## 
## Call:
## lm(formula = Revenue.passengers ~ Day_4 + Day_6 + Day_15 + Day_21 + 
##     Day_30, data = dfa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -334075  -82093  -19457  122307  345473 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  1893414     530668   3.568  0.00257 **
## Day_4        -182728    1389904  -0.131  0.89704   
## Day_6       -2893690    1974663  -1.465  0.16218   
## Day_15       7766905    5370514   1.446  0.16742   
## Day_21      -2909230    5940073  -0.490  0.63095   
## Day_30       4281765    2816676   1.520  0.14799   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 193600 on 16 degrees of freedom
## Multiple R-squared:  0.6089, Adjusted R-squared:  0.4867 
## F-statistic: 4.983 on 5 and 16 DF,  p-value: 0.006102

## (Intercept)       Day_4       Day_6      Day_15      Day_21      Day_30 
##   1893413.5   -182727.6  -2893690.0   7766904.6  -2909229.5   4281764.6

Conclusion:

Model plotted above has a very significant predictive model. Next step predicting Quarterly Revenue. Note: It is possible that this airline has a very predictive seasonal traffic that would explain such a blattant correlation.

4 - Predicting Quarterly Revenue from price per mile variation

dfa <- try(read.csv( paste0(basepath,"/data/B6_quarterly_finance_ppm.csv") , header=TRUE, sep = "," ,stringsAsFactors=FALSE) )

dfa$Revenue <- as.numeric(as.character(dfa$Revenue))
dfa$Quarter_ending <- as.Date(dfa$Quarter_ending, "%m/%d/%y")
#pairs(dfa[,c(2,4,5,8,9,10,11,13,15,16,17)], pch = 19)

fit <- lm(Revenue ~ Day_4+Day_7+Day_15+Day_30+Day_21, data = dfa )
summary(fit)
## 
## Call:
## lm(formula = Revenue ~ Day_4 + Day_7 + Day_15 + Day_30 + Day_21, 
##     data = dfa)
## 
## Residuals:
##        1        2        3        4        5        6        7        8 
##  -7.6096   0.6027   1.2713 -38.6544  25.8094  32.4321  -8.1616  -5.6899 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    802.1      409.0   1.961    0.189
## Day_4         1887.4      850.0   2.221    0.157
## Day_7        -2303.9     3878.9  -0.594    0.613
## Day_15        3931.8     4169.5   0.943    0.445
## Day_30        1028.4     1222.0   0.842    0.489
## Day_21       -2710.1     3311.9  -0.818    0.499
## 
## Residual standard error: 41.05 on 2 degrees of freedom
## Multiple R-squared:  0.9136, Adjusted R-squared:  0.6977 
## F-statistic: 4.231 on 5 and 2 DF,  p-value: 0.2022
pre <- predict( fit, dfa) 
dfa$pre <- pre
acu <- data.frame(cbind(actuals=dfa$Revenue, predicteds=pre))

acu
##   actuals predicteds
## 1    1478   1485.610
## 2    1487   1486.397
## 3    1571   1569.729
## 4    1477   1515.654
## 5    1451   1425.191
## 6    1650   1617.568
## 7    1623   1631.162
## 8    1564   1569.690
plot(acu)

coef(fit)
## (Intercept)       Day_4       Day_7      Day_15      Day_30      Day_21 
##     802.076    1887.418   -2303.878    3931.789    1028.374   -2710.069
bin<-hexbin(dfa$Revenue, pre, xbins=3) 
plot(bin, main="Hexagonal Binning")

Conclusion:

PRETTY GOOD, NO?

5 - Visual confirmation chart

Chart below is representing normalized (around their means) the fields: revenue, prediction, day 0 to 180

Note: Time serie is per quarter… January 2018 is in fact 2017 Q4 …